Example Notebook (Adult Census Income Dataset)

Load a dataset

For this example we will use the Adult Census Income dataset, which contains both categorical and numerical features.
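The notebook's actual loading cell isn't shown, but a minimal sketch of loading this dataset with pandas could look as follows. The real UCI file has no header row, so column names are supplied explicitly; the inline CSV snippet here is a tiny hypothetical stand-in for the real file.

```python
import io
import pandas as pd

# Column schema of the UCI Adult Census dataset (assumption: the
# notebook uses the same schema).
COLUMNS = [
    "age", "workclass", "fnlwgt", "education", "education-num",
    "marital-status", "occupation", "relationship", "race", "sex",
    "capital-gain", "capital-loss", "hours-per-week", "native-country",
    "income",
]

# Two illustrative rows standing in for the real file; note the
# unknown value "?" in the second row.
sample_csv = io.StringIO(
    "39, State-gov, 77516, Bachelors, 13, Never-married, Adm-clerical,"
    " Not-in-family, White, Male, 2174, 0, 40, United-States, <=50K\n"
    "50, ?, 83311, Bachelors, 13, Married-civ-spouse, Exec-managerial,"
    " Husband, White, Male, 0, 0, 13, United-States, >50K\n"
)

# skipinitialspace strips the space that follows each comma in the file.
df = pd.read_csv(sample_csv, names=COLUMNS, skipinitialspace=True)
print(df.shape)  # (2, 15)
```

With the real file, `pd.read_csv` would be pointed at the downloaded `adult.data` path instead of the in-memory snippet.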

Preprocess the dataset

The dataset contains unknown values, encoded as "?". In this step, all rows containing such values are removed.
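A minimal sketch of this cleaning step, using a small hypothetical DataFrame in place of the full dataset:

```python
import pandas as pd

# Toy stand-in for the loaded census data; "?" marks unknown values.
df = pd.DataFrame({
    "workclass": ["State-gov", "?", "Private"],
    "occupation": ["Adm-clerical", "Exec-managerial", "?"],
    "age": [39, 50, 38],
})

# Turn the "?" placeholder into a missing value, then drop every row
# that contains one.
clean = df.replace("?", pd.NA).dropna().reset_index(drop=True)
print(len(clean))  # 1 row survives
```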

Visualize the dataset

Three visualization functions offered by the XAI module will be used to analyze the dataset.
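The XAI module's specific plotting functions aren't reproduced here, but the kind of analysis they support, such as checking class balance across a categorical feature, can be sketched with plain pandas:

```python
import pandas as pd

# Toy sample of the dataset (hypothetical values for illustration).
df = pd.DataFrame({
    "sex": ["Male", "Male", "Female", "Female", "Male"],
    "income": ["<=50K", ">50K", "<=50K", "<=50K", ">50K"],
})

# Cross-tabulating the target against a categorical feature gives the
# per-group class counts that imbalance plots render as bar charts.
counts = pd.crosstab(df["sex"], df["income"])
print(counts)
```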

Target

In the cell below the target variable is selected. In this example we use the column loan as the target variable; it shows whether a person earns more than 50k per year (>50K | <=50K).
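Selecting the target typically means separating it from the feature columns and binarizing its labels. A minimal sketch, with a hypothetical three-row DataFrame:

```python
import pandas as pd

# Toy stand-in for the cleaned dataset; "loan" is the target column,
# as in the notebook.
df = pd.DataFrame({
    "age": [39, 50, 38],
    "hours-per-week": [40, 13, 40],
    "loan": ["<=50K", ">50K", "<=50K"],
})

# Binarize the target (1 if the person earns >50K) and split it off
# from the feature matrix.
y = (df["loan"] == ">50K").astype(int)
X = df.drop(columns=["loan"])
print(y.tolist())  # [0, 1, 0]
```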

Training the models

Four models are going to be trained on this dataset. The output below shows the accuracy, classification report, confusion matrix, and ROC curve for each model.
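The notebook doesn't name the four models here, so the sketch below uses four common scikit-learn classifiers as stand-ins, trained on synthetic data in place of the preprocessed census features:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

# Synthetic stand-in for the preprocessed census data.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = {
    "logistic": LogisticRegression(max_iter=1000),
    "tree": DecisionTreeClassifier(random_state=0),
    "forest": RandomForestClassifier(random_state=0),
    "knn": KNeighborsClassifier(),
}

# Fit each model and report accuracy and the confusion matrix on the
# held-out split.
for name, model in models.items():
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    print(name, accuracy_score(y_te, pred))
    print(confusion_matrix(y_te, pred))
```

The same loop could also emit `classification_report` and ROC curves per model, as the notebook's output does.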

Global model interpretations

In the following steps we will use global interpretation techniques, which help answer questions such as: How does the model behave in general? Which features drive its predictions, and which are irrelevant? This information can be very valuable for understanding the model. Most of these techniques work by investigating the conditional interactions between the target variable and the features over the complete dataset.

Feature importance

The importance of a feature is the increase in the model's prediction error after the feature's values are permuted, which breaks the relationship between the feature and the true outcome. A feature is "important" if permuting it increases the model error, because in that case the model relied heavily on the feature for making correct predictions. Conversely, a feature is "unimportant" if permuting its values leaves the error largely or entirely unchanged.
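The procedure described above can be sketched directly: measure baseline accuracy, shuffle one column at a time, and record the accuracy drop. This uses synthetic data and a random forest as stand-ins for the notebook's dataset and models:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
# Only 2 of the 5 features are informative, so permuting the others
# should barely move the error.
X, y = make_classification(n_samples=400, n_features=5, n_informative=2,
                           n_redundant=0, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X, y)
baseline = accuracy_score(y, model.predict(X))

# Permute one column at a time; the accuracy drop is that feature's
# permutation importance.
drops = []
for j in range(X.shape[1]):
    Xp = X.copy()
    Xp[:, j] = rng.permutation(Xp[:, j])
    drops.append(baseline - accuracy_score(y, model.predict(Xp)))
    print(f"feature {j}: importance {drops[-1]:.3f}")
```

In practice `sklearn.inspection.permutation_importance` does the same thing with repeated shuffles and a held-out set, which gives more stable estimates.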

ELI5

In the first case we use ELI5, which does not permute the features; it simply visualizes each feature's weight.
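For a linear model, the weights that ELI5's table renders are just the fitted coefficients, so the underlying idea can be sketched without the eli5 package itself (synthetic data used as a stand-in):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=4, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)

# ELI5 would display these per-feature weights as a sorted table;
# here we print them directly.
for j, w in enumerate(model.coef_[0]):
    print(f"feature {j}: weight {w:+.3f}")
```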